Chronic illnesses such as diabetes and heart disease continue to place a heavy burden on global health, making early detection more important than ever. This study introduces an artificial intelligence framework that predicts the risk of both conditions while keeping user data private. Using Random Forest ensemble learning, the system analyzes two well-known datasets: the PIMA Indians Diabetes dataset and the Cleveland Heart Disease dataset. To obtain reliable estimates, the models were evaluated with stratified train-test splitting and five-fold cross-validation. The diabetes model reached an accuracy of 75.97% with a ROC-AUC of 0.813, while the heart disease model performed better, achieving 81.67% accuracy and a ROC-AUC of 0.947. The framework does not only provide predictions; it also reports which features matter most, and these align with established medical risk factors, which makes the system more transparent and clinically meaningful. Compared with Logistic Regression and Decision Tree baselines, the ensemble approach proved more robust. Because inputs are processed locally, the framework protects privacy while offering dependable insights. Overall, the results show how AI can support preventive healthcare, helping individuals and clinicians act early and reduce long-term complications.
Introduction
Non-communicable diseases such as diabetes and heart disease are major global health threats, making early detection and prevention essential. Traditional diagnosis relies on manual assessment, which can be slow and subjective, whereas Artificial Intelligence (AI) and Machine Learning (ML) can provide faster, data-driven, and often more accurate predictions.
This research proposes a unified framework that predicts both diabetes and heart disease risk using the Random Forest algorithm. By averaging the votes of many decision trees, Random Forest reduces the overfitting typical of a single tree, and its feature importance scores make the predictions easier to interpret. All inputs are processed locally, which keeps user data private. The system is evaluated against Logistic Regression and Decision Tree models and shows more stable and balanced performance.
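As a rough illustration of the comparison described above, the following sketch trains the three model families with scikit-learn and ranks the Random Forest feature importances. It runs on synthetic data from `make_classification`, not on the PIMA or Cleveland datasets, and the hyperparameters (100 trees, fixed seeds, 8 features) are assumptions made for the example rather than the study's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the PIMA or Cleveland datasets.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

# The three model families named in the text; settings are illustrative.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# Five-fold stratified cross-validation, scored by ROC-AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    cv_scores[name] = (scores.mean(), scores.std())
    print(f"{name}: ROC-AUC {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit the forest on all data and rank features by impurity-based importance.
forest = models["random_forest"].fit(X, y)
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print(f"feature_{idx}: importance {importances[idx]:.3f}")
```

The standard deviation across folds is one simple way to judge how stable each model is. On real clinical data, the generic feature indices would be replaced by column names such as glucose or BMI.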
The study uses two benchmark datasets: the PIMA Indians Diabetes dataset and the Cleveland Heart Disease dataset. Preprocessing and evaluation include data cleaning, stratified train-test splitting, and five-fold cross-validation. Results show strong performance, with higher accuracy and ROC-AUC for heart disease prediction than for diabetes prediction. The key predictors identified are glucose, BMI, and age for diabetes, and chest pain type, thalassemia, and number of major vessels for heart disease.
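To make the mechanics of stratified splitting concrete, here is a minimal, dependency-free sketch. The function names `stratified_split` and `stratified_folds`, the 20% test fraction, and the toy label counts are all assumptions for illustration; the study's own split ratio is not stated here, and in practice a library routine such as scikit-learn's `train_test_split(..., stratify=y)` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class ratios preserved in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, indices, k=5, seed=42):
    """Deal the given indices into k folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Toy labels (illustrative only): 65 negatives and 35 positives.
labels = [0] * 65 + [1] * 35
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
folds = stratified_folds(labels, train_idx, k=5)

print("train size:", len(train_idx), "test size:", len(test_idx))
for number, fold in enumerate(folds, start=1):
    positives = sum(labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} samples, {positives} positive")
```

Because each class is split separately, the 35% positive rate of the toy labels is preserved in both the training and test portions, which is the property that keeps evaluation scores comparable across folds.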
Conclusion
This research introduced a privacy-preserving AI framework for predicting risks of both diabetes and heart disease using Random Forest ensemble learning. The system achieved strong results: the diabetes model reached 75.97% accuracy with a ROC-AUC of 0.813, while the heart disease model performed even better with 81.67% accuracy and a ROC-AUC of 0.947. Cross-validation confirmed the stability of these outcomes, and feature importance analysis provided interpretability by highlighting clinically relevant risk factors.
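The two metrics reported above measure different things: accuracy depends on the decision threshold, while ROC-AUC depends only on how well the model ranks positive cases above negative ones. The sketch below computes both from scratch on a handful of made-up scores. The helper names, the 0.5 threshold, and the toy values are assumptions for illustration and are unrelated to the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted risk scores (illustrative only).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.60]

# Turn scores into class predictions with an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"accuracy = {acc:.3f}")  # 0.750
print(f"ROC-AUC  = {auc:.3f}")  # 0.875
```

In this toy case the ranking is better than the thresholded predictions suggest, which mirrors how a model can show a high ROC-AUC alongside a more modest accuracy.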
By integrating dual-disease prediction into a single system and ensuring that all data is processed locally, the framework balances predictive reliability with privacy protection. This combination makes the approach practical for real-world preventive healthcare, where both accuracy and trust are essential. Ultimately, the study demonstrates how AI-driven analytics can support early detection, empower clinicians, and strengthen decision-making in managing chronic diseases.